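This is a basic overview of making some basic plots in ggplot. We will cover scatter plots, bar charts, box and whisker plots, violin plots and density plots, and then look at some ways to customize them.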

You should think of this as a “Best hits of intro to ggplot”. I have gone through and collected the material that I found most helpful when learning ggplot. All of it will be linked as we go through. If you want more information on a particular topic, those links would be great places to start.

The first step is to load all the libraries we might need. Make sure these are installed (if you don’t know how to install packages, look here).

It’s always a good idea to load tidyverse; that way, if you need to clean up any data before plotting, you’ll be good to go. The here package is helpful for loading and saving files, though you may not need to use it here. You need ggplot2 itself, and I don’t think we need to explain why (loading tidyverse already attaches it, so the explicit call is just a reminder). Now, patchwork is very exciting and I will show you exactly what it does later.

library(tidyverse)
library(here)
library(ggplot2)
library(patchwork)

A few notes on ggplot

## Basic Anatomy of a ggplot

p=ggplot(data,aes(aes1,aes2))+ #your data frame, then global aesthetics that apply to every geom (required)
    geom_X(aes(aes1,aes2))+ #X=point|bar|violin|etc; aesthetics set here apply only to this geom; you can have many `geom`s in one plot (required)
    theme() #a lot of your specifications will go here (not required)

p #this is how you get your plot to show up

You can also get the plot to show up automatically by not assigning it to an object.

ggplot(data,aes(aes1,aes2))+
    geom_X(aes(aes1,aes2))+
    theme()
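To make the template concrete, here is a minimal sketch that fills it in with `mtcars`, a dataset that ships with R. The choice of dataset and columns is mine and is only there so the example runs without any files.

```r
library(ggplot2)

# mtcars ships with R: wt = weight (1000 lbs), mpg = miles per gallon
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +   # data + global aesthetics
  geom_point(aes(colour = factor(cyl))) +     # aesthetic used by this geom only
  theme(legend.position = "bottom")           # optional tweaks

p  # printing the object draws the plot
```

Swapping `geom_point()` for another geom, or adding a second one with `+`, is all it takes to change the type of plot.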

Scatter Plots

Scatterplots are an excellent first plot to start off with. There are lots of ways to manipulate scatterplots to give very informative figures, which you will see further down on this page.

The data and further information on making scatterplots can be found here.

First things first: load the data. What I have written in this chunk may not work for you. You may have to do something along the lines of scatter=read.csv(file.choose()) and then select the scatter.csv from wherever you saved it on your computer.

It’s always a good idea to look at the data and make sure it loaded properly before you start plotting. This also makes sure you know what the column names are.

scatter=read.csv(here("data/scatter.csv"))%>%dplyr::select(-X)
head(scatter)

Basic Scatter Plot

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point()

Scatter plot with trendline

Sometimes you want to add a trendline. Since this dataset is not going to give a nice linear trendline as is, we are tweaking it a bit by taking the log of gdpPercap. This is very easy and is done by wrapping log() around our x variable.

For more information on how to get a line of best fit, see the documentation for geom_smooth.

ggplot(scatter,aes(x=log(gdpPercap),y=lifeExp))+
  geom_point()+
  geom_smooth(method="lm")
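As an alternative sketch, you can leave the variable alone and put the log on the axis with `scale_x_log10()`. The data frame below is made up for illustration (it is not scatter.csv), and the column names `gdp` and `life` are placeholders. Note that this uses log base 10 rather than the natural log, so the slope is rescaled, but the straight line drawn through the points is the same fit.

```r
library(ggplot2)

# Made-up data for illustration only (not the scatter.csv used above)
set.seed(1)
toy <- data.frame(gdp = exp(runif(50, log(500), log(50000))))
toy$life <- 20 + 6 * log(toy$gdp) + rnorm(50, sd = 3)

# Option 1: transform inside aes(), as in the plot above
p_aes <- ggplot(toy, aes(x = log(gdp), y = life)) +
  geom_point() +
  geom_smooth(method = "lm")

# Option 2: keep the raw variable and log-scale the axis instead;
# the tick labels then stay in the original units
p_scale <- ggplot(toy, aes(x = gdp, y = life)) +
  geom_point() +
  scale_x_log10() +
  geom_smooth(method = "lm")
```

The scale transformation is applied before the trendline is fitted, which is why the line is still straight in the second plot.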

Scatter plot with different aes for the points

When you have a lot of nice metadata associated with the variables you are plotting, it is nice to incorporate it into your figures. You can normally change

  • Shape
  • Colour
  • Fill
  • Alpha
  • Size

Note: We have taken the log() away from the X value.

Let’s change the colour of the points based on continent and scale the points based on pop.

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))

Let’s change the shape of the points.

ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(shape=continent,size=pop))

You can see we get a lot of strange shapes. You can specify the shapes you want by using a number code. You can find all of those here.
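Here is a small sketch of picking shapes by number with `scale_shape_manual()`. The data frame is made up, with five groups standing in for the five continents, and the particular codes are just one possible choice.

```r
library(ggplot2)

# Made-up data: five groups standing in for the five continents
toy <- data.frame(
  x = 1:10,
  y = c(2, 4, 3, 6, 5, 8, 7, 9, 8, 10),
  group = rep(c("A", "B", "C", "D", "E"), each = 2)
)

# One shape code per group: 15 = square, 16 = circle, 17 = triangle,
# 18 = diamond, 8 = asterisk
p_shapes <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(aes(shape = group), size = 3) +
  scale_shape_manual(values = c(15, 16, 17, 18, 8))
```

The codes are matched to the groups in order, so the first group gets the first code and so on.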

What’s the difference between fill and colour? Also, why are there different versions of the same shape?
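One way to explore this question is the sketch below, again with made-up data. For points, shapes 0 to 20 only respond to colour, while shapes 21 to 25 have an outline set by colour and an interior set by fill. That is why the same circle exists as a hollow, a solid and a fillable version.

```r
library(ggplot2)

toy <- data.frame(x = 1:3, y = 1:3, group = c("A", "B", "C"))

# Shape 21 is a circle with a separate outline and interior:
# `colour` sets the outline, `fill` sets the inside
p_fill <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(aes(fill = group), shape = 21, colour = "black", size = 5)

# Shape 16 is a solid circle that only responds to `colour`;
# a `fill` mapping would have no visible effect here
p_colour <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(aes(colour = group), shape = 16, size = 5)
```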

Bar Charts and Violin Plots

What I have written in this chunk may not work for you. You may have to do something along the lines of bar=read.csv(file.choose()) and then select the bar_plot.csv from wherever you saved it on your computer.

The dataset we will be using is looking at how much sleep students in different years of school get (in minutes) at different times in the year.

Bar Chart

More information on making bar charts can be found here

bar=read.csv(here("data/bar_plot.csv"))
head(bar)

There are a lot of ways you can display a bar chart. It’s very easy to switch between them.

All of these will have stat="summary" and fun="mean" in the geom_bar() (the argument was called fun.y before ggplot2 3.3.0 and that name is now deprecated); this is how we make sure we are plotting the mean of each category.
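If it helps to see what the summary is doing, here is a sketch that computes the means by hand with dplyr and plots them with `geom_col()`. The data frame is a made-up stand-in for the sleep data; only the column names are taken from the text.

```r
library(ggplot2)
library(dplyr)

# Made-up stand-in for the sleep data (column names match the text)
toy <- data.frame(
  time = rep(c("week2", "final"), each = 4),
  year = rep(c("first", "second"), times = 4),
  minutes = c(480, 420, 500, 440, 400, 380, 420, 360)
)

# Option 1: let ggplot compute the mean of each time/year group
p_summary <- ggplot(toy, aes(x = time, y = minutes, fill = year)) +
  geom_bar(position = "dodge", stat = "summary", fun = "mean")

# Option 2: compute the means yourself, then plot the values as they are
toy_means <- toy %>%
  group_by(time, year) %>%
  summarise(minutes = mean(minutes), .groups = "drop")

p_precomputed <- ggplot(toy_means, aes(x = time, y = minutes, fill = year)) +
  geom_col(position = "dodge")
```

Both plots should show the same bar heights. The second route is handy when you also want the summary table itself.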

Grouped Bar Chart

ggplot(bar, aes(x=time,y=minutes,fill=year))+
  geom_bar(position="dodge",stat="summary",fun="mean") #fun replaces the deprecated fun.y

Stacked Bar Chart

Changing the position will allow us to have different types of bar charts. To get it stacked, use position="stack".

ggplot(bar, aes(x=time,y=minutes,fill=year))+
  geom_bar(position="stack",stat="summary",fun="mean") #fun replaces the deprecated fun.y

Percent Bar Chart

ggplot(bar, aes(x=time,y=minutes,fill=year))+
  geom_bar(position="fill",stat="summary",fun="mean") #fun replaces the deprecated fun.y

To summarize:

  • grouped bar chart: position="dodge"
  • stacked bar chart: position="stack"
  • percent bar chart: position="fill"

Box and Whisker Plot

Box and whisker plots are considered an improvement over the bar plot because they give a better idea of the spread of the data. You can see the median, quartiles and outliers, none of which are evident with the bar plot.

More information on making box and whisker plots can be found here

ggplot(bar, aes(x=time,y=minutes,fill=year))+geom_boxplot()
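To see where the parts of the box come from, here is a sketch in base R using a made-up vector of sleep times. It follows the default rule that points more than 1.5 times the interquartile range beyond the box are drawn as outliers.

```r
# Made-up sleep times in minutes, with one unusually low value
minutes <- c(300, 410, 420, 430, 440, 450, 460, 470, 480)

# The thick line in the box is the median; the box edges are the
# first and third quartiles
box <- quantile(minutes, probs = c(0.25, 0.5, 0.75))

# Points more than 1.5 * IQR beyond the box are drawn as outliers
iqr <- IQR(minutes)
lower_fence <- box[["25%"]] - 1.5 * iqr
upper_fence <- box[["75%"]] + 1.5 * iqr
outliers <- minutes[minutes < lower_fence | minutes > upper_fence]

mean(minutes)  # not drawn by geom_boxplot(); compare it with the median
```

Here the single low value pulls the mean below the median, which is the kind of thing a bar of means would hide.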

Violin Plot

Violin plots are sometimes considered another level up from the box and whisker plot (so, to keep track, bar<box<violin), since they give a better (more visual) idea of how the points are distributed.

More information on violin plots can be found here

ggplot(bar, aes(x=time,y=minutes,fill=year))+
  geom_violin()

You may have noticed that in all of these the x-axis categories appear in the order final, midterm, week2 (alphabetical). While not a big deal, it would be nice if they were week2, midterm, final. We are going to get into how to change that later. For now, we will stick to the basic plots.

Density Plots

What I have written in this chunk may not work for you. You may have to do something along the lines of density=read.csv(file.choose()) and then select the density.csv from wherever you saved it on your computer.

More information on density plots can be found here

This is a randomly generated dataset for the weights of males vs. females.

density=read.csv(here("data/density.csv"))
head(density)
ggplot(density, aes(x=weight,fill=sex)) + 
  geom_density()

A large chunk of this figure is overlapping. If we want to be able to see what is going on, we can change the alpha (how opaque the fills are).

ggplot(density, aes(x=weight,fill=sex)) + 
  geom_density(alpha=0.5)

Customizing your Plots

#relabel the axes
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab("GDP Per Capita")+
  ylab("Life Expectancy")

#swap the default grey theme for a built-in one
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab("GDP Per Capita")+
  ylab("Life Expectancy")+
  theme_bw()

#build the same plot with four built-in themes and store each one;
#the legends are hidden so the plots fit side by side
scatter_bw=ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab("GDP Per Capita")+
  ylab("Life Expectancy")+
  ggtitle("theme_bw")+
  theme_bw()+
  theme(legend.position = "none")
#note: scatter_light holds the theme_classic() version
scatter_light=ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab("GDP Per Capita")+
  ylab("Life Expectancy")+
  ggtitle("theme_classic")+
  theme_classic()+
  theme(legend.position = "none")
scatter_dark=ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab("GDP Per Capita")+
  ylab("Life Expectancy")+
  ggtitle("theme_dark")+
  theme_dark()+
  theme(legend.position = "none")
scatter_minimal=ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab("GDP Per Capita")+
  ylab("Life Expectancy")+
  ggtitle("theme_minimal")+
  theme_minimal()+
  theme(legend.position = "none")

Patchwork

#| puts plots side by side, / stacks one row on top of another
(scatter_bw|scatter_light)/(scatter_dark|scatter_minimal)
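The two patchwork operators used here can be tried on their own with a sketch like the one below. It uses `mtcars`, which ships with R, so the plots themselves are placeholders and not part of this tutorial's data.

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()
p3 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# `|` places plots side by side, `/` stacks them vertically
top_row <- p1 | p2
layout <- top_row / p3   # two plots on top, one wide plot underneath

layout
```

Brackets work the way they do in arithmetic, so you can group plots into rows before stacking them.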

#remove all of the grid lines
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab("GDP Per Capita")+
  ylab("Life Expectancy")+
  theme_bw()+
  theme(panel.grid = element_blank())

#remove only the vertical grid lines
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab("GDP Per Capita")+
  ylab("Life Expectancy")+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

#rename the legend titles with guides()
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  xlab("GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

#colour the continents with the discrete viridis palette
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_viridis_d()+
  ggtitle("Scatterplot with Viridis Colouring")+
  xlab("GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

#colour the continents with a ColorBrewer palette
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_brewer(palette = "Paired")+
  ggtitle("Scatterplot with Colour Brewer")+
  xlab("GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

If you’re going to get into manual colouring, look here.

#pick the colours yourself, one per continent
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_manual(values=c("orangered3","slateblue","lightseagreen","orchid3","sienna2"))+
  ggtitle("Scatterplot with Manual Colours")+
  xlab("GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())

#split the plot into one panel per continent with facet_grid()
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_viridis_d()+
  xlab("GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank())+
  facet_grid(~continent)

#same facets, with the x-axis labels rotated 45 degrees so they are easier to read
ggplot(scatter,aes(x=gdpPercap,y=lifeExp))+
  geom_point(aes(colour=continent,size=pop))+
  scale_colour_viridis_d()+
  xlab("GDP Per Capita")+
  ylab("Life Expectancy")+
  guides(colour=guide_legend(title="Continent"),size=guide_legend(title="Population"))+
  theme_bw()+
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.text.x = element_text(angle=45,vjust=0.5))+
  facet_grid(~continent)

Changing Factor Levels

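#plot with the original (alphabetical) order of time
bar_ofl=ggplot(bar, aes(x=time,y=minutes,fill=year))+geom_bar(position="dodge",stat="identity")
#set the order of the levels explicitly
bar$time <- factor(bar$time,levels = c("week2","midterm","final"))
#same plot with the reordered levels, shown next to the original
bar_rfl=ggplot(bar, aes(x=time,y=minutes,fill=year))+geom_bar(position="dodge",stat="identity")
(bar_ofl|bar_rfl)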
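The reason the original order was final, midterm, week2 is that R sorts factor levels alphabetically unless told otherwise, and ggplot follows the level order. A small sketch with a made-up character vector shows the difference:

```r
time <- c("week2", "midterm", "final", "week2")

# Default: levels are sorted alphabetically
levels(factor(time))
# "final" "midterm" "week2"

# Explicit levels: the order you give is the order ggplot uses on the axis
time_ordered <- factor(time, levels = c("week2", "midterm", "final"))
levels(time_ordered)
# "week2" "midterm" "final"
```

Any value that is not listed in `levels` becomes NA, so it is worth checking the spelling of each level against the data.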


library(igraph)
library(ggnetwork)

Troubleshooting